Matching and maximizing are two ends of a spectrum of policy search algorithms

January 2, 2004

Author

  • Sebastian Seung
Abstract

According to the matching law, when an animal makes many repeated choices between alternatives, its preferences are in the ratio of the incomes derived from the alternatives. Because matching behavior does not maximize reward, it has been difficult to explain using optimal foraging theory or rational choice theory. Here I show that matching and maximizing can be regarded as two ends of a spectrum of policy search algorithms from reinforcement learning. The algorithms are parametrized by the time horizon within which past choices are correlated with present reward. Maximization corresponds to the case of a long time horizon, while matching corresponds to a short horizon. From this viewpoint, matching is an approximation to maximizing, with the advantage of faster learning and more robust performance in nonstationary environments. Between these two ends of the spectrum lie many strategies intermediate between matching and maximizing.

If an animal’s relative preferences for alternatives are in the ratio of the incomes derived from them, then its behavior is said to be “matching.” Matching behavior has been observed for certain types of reinforcement schedules, in particular those that randomize the interval between reward. The matching law was important because it gave the law of effect a quantitative formulation.

Given the matching law as an empirical observation about behavior, two questions immediately come to mind. The first question is functional: why is matching a good policy for animals to follow? The second is mechanistic: what neural mechanisms underlie the production of matching behavior? This note mainly addresses the first question, by elucidating the function of matching from the viewpoint of the mathematical theory of reinforcement learning. However, the second question is also peripherally addressed through mathematical developments that are shared by recent neural network models of matching behavior.
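
As a rough illustration, the matching law described above can be written in symbols for the case of two alternatives. The notation is an assumption introduced here and does not appear in the source: $N_i$ stands for the number of times alternative $i$ is chosen, and $I_i$ for the total income (reward) obtained from it.

```latex
% Matching law for two alternatives (notation assumed, not from the source):
% N_i = number of choices of alternative i, I_i = income earned from alternative i.
\[
  \frac{N_1}{N_2} = \frac{I_1}{I_2}
  \qquad\Longleftrightarrow\qquad
  \frac{I_1}{N_1} = \frac{I_2}{N_2}
\]
```

Read this way, matching says that the income per choice (the "return") is the same for both alternatives, which is a different condition from maximizing the total income $I_1 + I_2$.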
One of the most common ways to explain the function of a behavior is to argue that it has been adapted by evolution to be optimal. Such an explanation for matching has been elusive, because matching does not generally maximize the animal’s overall rate of reward. In this respect, the matching law is suboptimal. This enables it to be used as an explanation for “irrational” human behaviors, such as addiction and other behaviors attributed to lack of “self-control.” Nevertheless, it would be hasty to completely reject optimality as an explanation of matching. Often matching is close to optimal, even if it is not exactly so.




Publication date: 2004